Read This Before You Develop a Surrogate Simulation Model

Experts discuss tips and dilemmas in developing data-driven ROMs (reduced order models).

Analysis of a bracket used to install an actuator mounted on a pin placed between the two holes in the bracket arms. The objective is the actuator’s horizontal misalignment, which should not be too large. An uncertainty quantification analysis reveals how sensitive the misalignment is to variations in geometric dimensions, such as height and width, of the bracket. Image courtesy of COMSOL.


As AI-based simulation gains momentum, the use of reduced order models (ROMs), sometimes also called surrogate models, is garnering much more attention. In engineering workflows, their potential to cut down simulation time is a tantalizing factor.

In the broadest sense of the term, any simplified simulation model is a ROM or surrogate. Simplification through machine learning and artificial intelligence training certainly falls under this umbrella, but the terms also encompass other methods, such as reducing the mesh count (turning a high-resolution model into a low-resolution one to save computing time), reducing a 3D problem to 2D (simulating the cross-section of the model instead of the full model), and even back-of-the-envelope approximations.

Gavin Jones, principal application engineer, SmartUQ, points out, “For example, simplifying the physics in a very complicated computational fluid dynamics (CFD) model gives you a ROM. A data-driven method also gives you a ROM. When the data-driven method is employed, the ROM is generally referred to as a surrogate model. So, all surrogate models are ROMs, but not all ROMs are surrogate models.”

Developing data-driven ROMs or surrogate models is SmartUQ’s specialty. “Simulations can involve the need to solve large systems of differential equations, which takes lots of computational time. Once a surrogate model is trained, however, predictions are made quickly without the need to solve such systems of equations,” Jones clarifies.

“[ROM building tools] are now built in, so it certainly makes it easier,” says Matthew Hancock, principal at engineering consulting firm Veryst Engineering. “It’s part of software like COMSOL Multiphysics and other simulation packages. Most or all large software companies are attempting this (COMSOL, Ansys, Siemens, Cadence, etc.), since everyone is asking for it.”

His colleague Sean Teller, principal at Veryst Engineering, notes, “The neural network libraries you need to develop ROMs are more readily available now. People are certainly talking a lot more about surrogate models. They are part of our own quiver of tools we use, but our own usage has not increased significantly.”

How Do ROMs Speed Up Simulation?

Suppose you have a series of archival simulation runs that reveal how the same valve design with different bend angles produces different water pressures. A data-driven approach allows you to identify the correlations between the bend angles and the output pressures, making it possible to bypass a full simulation run for the subsequent designs.
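The correlation step described above can be sketched in a few lines. The bend angles and pressures below are made-up numbers standing in for archival CFD results, and an ordinary polynomial fit stands in for the far more sophisticated regressors real surrogate tools use:

```python
import numpy as np

# Hypothetical archival data: valve bend angle (degrees) -> outlet pressure (kPa).
# In practice these values would come from prior physics-based CFD runs.
angles = np.array([15.0, 30.0, 45.0, 60.0, 75.0, 90.0])
pressures = np.array([101.2, 99.8, 97.1, 93.0, 87.4, 80.1])

# Fit a simple quadratic response surface as the surrogate.
coeffs = np.polyfit(angles, pressures, deg=2)
surrogate = np.poly1d(coeffs)

# Predict the pressure for a new design without running a full simulation.
print(round(float(surrogate(50.0)), 1))
```

Once fitted, evaluating the surrogate is a handful of arithmetic operations, which is why it can stand in for a solver run that takes hours.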

“The data is based on physics. So for a start, you need a series of physics-based simulations to train on,” says Yann Ravel, senior CFD engineer, Ansys.

In January 2024, Ansys launched Ansys SimAI, a cloud-enabled generative artificial intelligence platform. Ansys wrote, “Ansys SimAI uses the shape of a design itself as the input, facilitating broader design exploration even if the structure of the shape is inconsistent across the training data. The application can boost prediction of model performance across all design phases by 10 to 100 times for computation-heavy projects.”

SimAI is GPU-accelerated. To develop a ROM in SimAI, you would be asked to upload the boundary conditions of the simulation run, along with surface or volume data carrying the physics fields. The boundary conditions are not mandatory, clarifies Ravel. The platform accepts input in the open-source VTK (Visualization Toolkit) format.

Treating Physics as a Black Box

Most simulation programs specialize in specific types of physics: structural mechanics, electromagnetics, fluid dynamics, and so on. When running simulations, the software uses the known principles of these physics to calculate the displacement, deformation, pressure buildup, airflow, and other results. The data-driven approach is quite different.

“[The AI] is not trying to understand the physics involved,” says Jones. “It’s simply trying to look at the correlations between the inputs and outputs.” The responsibility is on the user—the human domain expert—to recognize the limits of the training dataset and the anomalies in the ROM’s predictions. For example, seeing negative values where negative values are impossible in reality, or recognizing that the new manufacturing material used in the design is not part of the training dataset. Such signs point to the need to reassess the ROM’s reliability.
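The two warning signs Jones describes can be checked mechanically. The sketch below is illustrative only (the function, bounds, and data are hypothetical, not part of any vendor's tool): it flags a prediction that violates a known physical bound, and a query that falls outside the range of the training data.

```python
import numpy as np

def check_prediction(x_new, y_pred, x_train, lower_bound=0.0):
    """Illustrative sanity checks on a surrogate's output (hypothetical helper,
    not any vendor's API)."""
    warnings = []
    # 1. Physically impossible output, e.g. a negative value for a quantity
    #    that cannot be negative in reality.
    if y_pred < lower_bound:
        warnings.append("prediction below physical lower bound")
    # 2. Extrapolation: the query lies outside the training data's envelope.
    if np.any(x_new < x_train.min(axis=0)) or np.any(x_new > x_train.max(axis=0)):
        warnings.append("input outside the training envelope")
    return warnings

x_train = np.array([[15.0], [30.0], [45.0], [60.0]])  # training inputs
print(check_prediction(np.array([120.0]), -3.2, x_train))  # both checks fire
```

Automated checks like these do not replace the domain expert, but they make it harder for an unreliable prediction to slip through unnoticed.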

“In addition to tools for developing surrogate models, SmartUQ has tools that allow you to quantify the uncertainty of the models,” Jones points out. SmartUQ runs on Windows and Linux, and in the new release scheduled for December, the company plans to add GPU acceleration, according to Jones.

“The advantage of AI is, it can handle multiple parameters,” says Ravel. “Human engineers have to change one parameter at a time to figure out the impact of that change, but you can throw lots of inputs at the AI, and it can figure out the correlations.”

No Magic Number

Beginning users of AI-based simulation often confront the data-volume question: how many simulations are enough to develop a reliable ROM? Experts tend to refrain from specifying a number, because there are too many variables to consider.

The Ansys model evaluation report showing the effectiveness of the SimAI model trained on SUV data. Image courtesy of Ansys.

One issue is the cost of the simulation run itself. For example, it requires much more time and effort to set up and simulate a car crash than it does to simulate a cellphone drop. Therefore, the number of simulations executed for AI training will vary from user to user, based on the target application.

“How long the simulation takes for someone to set up or how long the simulation takes to run affects how much training data a person is willing to collect,” Jones explains. “If someone is collecting less data simply because collecting the data is too time-consuming, the trade-off will be a lower accuracy surrogate model.”

Teller points out that to simultaneously perform multiple simulations to collect data, “You need software licenses in addition to hardware, and that can add significant cost for simultaneous simulations.”

Hancock adds, “You need to think outside the box—outside of your current dataset. If you’re going to use 10,000 simulations, use those that don’t look like one another, those that don’t have similar boundary conditions, to extend your design space.”

For Ravel, a small volume of data is not necessarily an issue; what’s more important is data consistency. “If you have only 20 simulation runs, then start your training with that. You know it won’t be the best, but it’s a learning process. You can also use the AI-derived model to see if you need more of a specific type of data to make it more accurate.” However, if the limited volume of data is your concern, Ravel says, “SimAI has tools that can help you auto-populate the data fields.”

Linear or Nonlinear

For someone starting from scratch, there’s something else to consider. “As you vary the inputs, if you see that the outputs also change linearly, then it takes the AI a lot fewer simulations to learn the correlations. So you don’t need a lot of simulations to train the AI. On the other hand, if the relationship is not linear, you would need a lot more simulations. But you won’t know this until you start running simulations,” Jones says.
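A toy experiment with synthetic data makes Jones's point concrete. With the same tiny budget of three "simulations," a straight-line fit recovers a linear response essentially exactly, while it misses a nonlinear response badly:

```python
import numpy as np

# Three "simulation runs" at these input values (synthetic illustration).
x_train = np.array([0.0, 0.5, 1.0])
x_check = 0.25  # an unseen design point

# Case 1: linear response y = 2x. A line fit to three points is exact.
lin_fit = np.poly1d(np.polyfit(x_train, 2 * x_train, 1))
print(abs(lin_fit(x_check) - 2 * x_check) < 1e-9)       # True

# Case 2: nonlinear response y = sin(6x). The same budget is far off.
nonlin_fit = np.poly1d(np.polyfit(x_train, np.sin(6 * x_train), 1))
print(abs(nonlin_fit(x_check) - np.sin(6 * x_check)) > 0.1)  # True
```

The catch, as Jones notes, is that you generally cannot tell which case you are in until you have run some simulations.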

If pressed, Jones says he would offer simple guidance. “We say a reasonable starting point is 10 samples per input dimension. So if you have six variables, then I’d say start with 60 simulation runs. But before I make that recommendation, I’d need to have a better understanding of your simulation problem, and how difficult or easy it is for you to collect data.”
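The "10 samples per input dimension" starting point is usually paired with a space-filling design so that the runs do not cluster, echoing Hancock's advice to pick simulations that don't look like one another. The sketch below hand-rolls a basic Latin hypercube design for six variables; it is a generic illustration, not SmartUQ's actual sampler:

```python
import numpy as np

def latin_hypercube(n_samples, n_dims, seed=None):
    """Basic Latin hypercube design on [0, 1]^d: one stratified sample per
    interval in each dimension, independently shuffled per column."""
    rng = np.random.default_rng(seed)
    u = (rng.random((n_samples, n_dims)) + np.arange(n_samples)[:, None]) / n_samples
    for d in range(n_dims):
        rng.shuffle(u[:, d])
    return u

n_dims = 6                   # six design variables
n_samples = 10 * n_dims      # the "10 samples per input dimension" rule of thumb
design = latin_hypercube(n_samples, n_dims, seed=0)
print(design.shape)          # (60, 6)
```

Each row of `design`, rescaled to the real parameter ranges, becomes one simulation run in the training set.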

“Your source simulation may have temperature, velocity, pressure, and so on, but you don’t want to overload the training model with all the data if you don’t have to,” Ravel says. “If you’re only interested in the forces that create pressure, then target them. You might develop one ROM for velocity prediction, one for pressure, but you don’t need to have one that predicts everything.”

In fact, for many simple linear problems, you may not need to bother with training and developing a surrogate model. An industry veteran or domain expert may be able to give you a back-of-the-envelope formula that works just as well.

“If you’re not getting enough flow in a pipe, double the inner diameter,” Hancock says. “It will reduce the flow resistance sixteen-fold. If you have a wide channel, doubling the channel height reduces the flow resistance eight-fold. We’re the subject matter experts; this is the bread-and-butter domain knowledge of fluid mechanics, so we know these things. Folks with expertise in different areas may not know this. But I think together is better. Combining a well-developed surrogate modeling tool with domain expertise gets you much further.”
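Hancock's numbers follow directly from laminar-flow theory: the Hagen-Poiseuille resistance of a circular pipe scales as 1/d^4, and the lubrication-theory resistance of a wide rectangular channel scales as 1/h^3. A quick check (with arbitrary illustrative values for viscosity and geometry):

```python
from math import pi

def pipe_resistance(mu, length, d):
    """Hagen-Poiseuille hydraulic resistance of a circular pipe:
    R = 128 * mu * L / (pi * d^4)."""
    return 128.0 * mu * length / (pi * d**4)

def wide_channel_resistance(mu, length, width, h):
    """Wide-channel (w >> h) resistance from lubrication theory:
    R = 12 * mu * L / (w * h^3)."""
    return 12.0 * mu * length / (width * h**3)

mu, L = 1.0e-3, 0.5  # water-like viscosity (Pa*s) and length (m), for illustration

# Doubling the pipe diameter cuts resistance sixteen-fold.
print(pipe_resistance(mu, L, 0.01) / pipe_resistance(mu, L, 0.02))   # 16.0

# Doubling a wide channel's height cuts resistance eight-fold.
print(wide_channel_resistance(mu, L, 0.1, 0.001)
      / wide_channel_resistance(mu, L, 0.1, 0.002))                  # 8.0
```

This is exactly the kind of bread-and-butter domain knowledge that can make a surrogate model unnecessary for simple linear problems, or keep one honest when it is used.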

ROMs should be constantly evolving, to incorporate new discoveries and to keep up with the finite element analysis and CFD physics they are trained on. “From time to time, the FEA or CFD software itself might improve, because the physics gets better. Then you may want to retrain your AI model,” notes Jones.

Double-Checking the ROM

Ravel has 16 years of CFD experience in F1 motorsport, working with brands like Ferrari and Renault. As part of the Ansys Customer Excellence team, Ravel also uses SimAI to develop proofs of concept to share with customers.

“If you have 30 simulation runs in your database, SimAI would set aside 10%—by randomly selecting three of the simulations—and only use 27 for training,” Ravel explains. “Later, it would use those three that are outside the training data, to challenge or check the AI’s accuracy.”

Because the three simulations withheld were not part of the training, they could not have influenced the behavior of the ROM’s predictions. Ravel would later use the ROM to see if it could produce the same results seen in the three simulations set aside earlier. If it does, then there’s good reason to believe the ROM is reliable.
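The holdout procedure Ravel describes is a standard train/test split. Sketched outside any particular tool, with run indices standing in for the simulations:

```python
import numpy as np

rng = np.random.default_rng(seed=42)
n_runs = 30
all_runs = np.arange(n_runs)  # indices of the archived simulation runs

# Withhold 10% of the runs (three, here) as an unseen test set,
# and train the surrogate only on the remaining 27.
n_test = max(1, round(0.10 * n_runs))
test_runs = rng.choice(all_runs, size=n_test, replace=False)
train_runs = np.setdiff1d(all_runs, test_runs)

print(len(train_runs), len(test_runs))  # 27 3
```

After training, the ROM's predictions on `test_runs` are compared against the withheld simulation results; close agreement is the evidence of reliability Ravel is after.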

“Physics-based simulation mimics what happens in reality,” Ravel says. “A ROM cannot be more accurate than the physics-based simulation you use to train it. We might do less physics-based simulation if we have a ROM, but it’s not going to replace physics-based simulation.”

“Whether it’s a surrogate model, a ROM, or a full physics model, they all include a set of assumptions,” says Hancock. “So in a sense, all models are wrong, but some are very useful. You should always verify the model with something real. Make sure the model is accurately predicting what the product is doing.”

If the use of ROMs or surrogate models becomes widespread, the human domain expert’s role becomes much more important, not less. “Make sure you understand the physics behind the model,” says Hancock.



About the Author

Kenneth Wong

Kenneth Wong is Digital Engineering’s resident blogger and senior editor. Email him at kennethwong@digitaleng.news or share your thoughts on this article at digitaleng.news/facebook.
